Skip to content

BUG: read_excel failed with empty rows after MultiIndex header #40649

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Apr 23, 2021

Conversation

ahawryluk
Copy link
Contributor

Prior to this fix, a blank data row after a MultiIndex header was a interpreted as containing a blank index name, but that only works if the user has specified an index column. If index_col is None all subsequent rows should be treated as data, even if the first one is empty.

Copy link
Member

@dsaxton dsaxton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahawryluk, can you update the title to a more meaningful one?

@dsaxton dsaxton added Bug IO Excel read_excel, to_excel labels Mar 26, 2021
@ahawryluk ahawryluk changed the title Excel 2blanks BUG: read_excel failed with empty rows after MultiIndex header Mar 26, 2021
@ahawryluk
Copy link
Contributor Author

@dsaxton thanks for catching those

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ahawryluk - looks good. A few questions below.

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, cc @jreback

@rhshadrach rhshadrach added this to the 1.3 milestone Apr 11, 2021
@@ -554,7 +554,9 @@ def parse(
header_name, _ = pop_header_name(data[row], index_col)
header_names.append(header_name)

has_index_names = is_list_like(header) and len(header) > 1
has_index_names = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment here on what the inference rules are

@@ -702,7 +702,7 @@ cdef class TextReader:
ic = (len(self.index_col) if self.index_col
is not None else 0)

if lc != unnamed_count and lc - ic > unnamed_count:
if (lc != unnamed_count and lc - ic > unnamed_count) or ic == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a similar comment to below (e.g. for future readers what are the inference rules)

columns = MultiIndex.from_tuples([("a", "A"), ("b", "B")])
expected = DataFrame(data, columns=columns)
data = "a,b\nA,B\n,\n1,2\n3,4"
result = parser.read_csv(StringIO(data), header=[0, 1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens with index_col not None (do we already tests this)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these two tests both have MultiIndex headers and index_col not None:

def test_header_multi_index(all_parsers):

def test_header_multi_index_common_format1(all_parsers, kwargs):

@ahawryluk
Copy link
Contributor Author

@jreback I've added the new comments; let me know if anything else is needed. Thanks

@jreback jreback merged commit 4caf4c7 into pandas-dev:master Apr 23, 2021
@jreback
Copy link
Contributor

jreback commented Apr 23, 2021

thanks @ahawryluk very nice

@ahawryluk ahawryluk deleted the excel_2blanks branch April 23, 2021 17:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: read_excel fails with multirow headers if headers are followed by two empty rows
4 participants